ED 137 399                                        TM 006 205

AUTHOR        Raffeld, Paul; Reynolds, William M.
TITLE         Increasing the Efficiency of Pretest-Posttest
              Designs.
PUB DATE      Apr 77
NOTE          9p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (61st, New
              York, New York, April 4-8, 1977)

EDRS PRICE    MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS   Control Groups; Correlation; Experimental Groups;
              Hypothesis Testing; Measurement Techniques; *Post
              Testing; *Pretesting; *Research Design; Sampling;
              *Statistical Analysis; Test Reliability; *Tests of
              Significance

ABSTRACT

The pretest-posttest design referred to as Design 2
by Campbell and Stanley (1963) is commonly used in educational
research and evaluation. The tenability of the assumption of a zero
population difference commonly used with this design is questioned. A
nonzero population estimate based on the mean difference observed in
test-retest reliability data is recommended. When a control group is
available, it is recommended that the pretest-posttest difference for
the control group be subtracted from the experimental group
difference. This will produce a more accurate estimate of the
magnitude of change for the experimental group. (Author)






Increasing the Efficiency of Pretest-Posttest Designs



Paul C. Raffeld
Office of Evaluation and Development
Louisiana Health and Human Resources



William M. Reynolds
State University of New York at Albany







Abstract 



The pretest-posttest design referred to as Design 2 by Campbell and Stanley
(1963) is commonly used in educational research and evaluation. The
tenability of the assumption of a zero population difference commonly used with
this design is questioned. A non-zero population estimate based on the mean
difference observed in test-retest reliability data is recommended, thus
allowing for greater control of some of the factors known to affect Design 2
results. When a control group is available, the authors recommend that the
pretest-posttest difference for the control group be subtracted from the
experimental group difference. This will produce a more accurate estimate of
the magnitude of change for the experimental group.



INCREASING THE EFFICIENCY OF
PRETEST-POSTTEST DESIGNS



The one-group pretest-posttest design, referred to by Campbell and
Stanley (1963) as Design 2, has been criticized for its lack of validity
(Campbell & Stanley, 1963; Kerlinger, 1973). One method of reducing the
number of false inferences concerning the existence of a treatment effect
when using Design 2 is to test sample differences against a non-zero
population estimate.

When statistically comparing two means, the general form of the null
hypothesis is $\mu_1 - \mu_2 = k$. In spite of the fact that k can be set to any
small value considered of practical interest (Winer, 1971), the overriding
tendency is to set k to zero, the expected value, so that the null hypothesis
becomes $\mu_1 = \mu_2$. While there is nothing statistically wrong with testing
against an expected value, it is obvious that such a practice results in
maximizing the sensitivity of the test to any significant difference,
regardless of any practical implications.

Although the authors support the more frequent use of values greater
than the expected value, this paper addresses the issue of underestimating
the expected value and the consequent increase in detecting invalid
significant differences. It is in this sense that the word "efficiency" is used
in the title rather than in the sense of statistical power. Actually, using
a value of k greater than the expected value would decrease the power of the
test only if the specific alternative hypothesis used against the zero
expected value was maintained, and this would not be likely.

In the case of the t test for independent samples, there is no debate
that the expected value is zero, given random sampling and random assignment
to treatments. However, it is not reasonable to conclude that an expected
value of zero is correct in the case of the t test for correlated samples used
in Design 2. It is quite common to find the mean of the second administration
of a test to be higher than the mean of the first administration, even over
short periods of time and with no deliberate intervention.

Potential estimates of population differences are available in many
instances, but they are seldom used. One major reason for this is a concern
over the accuracy of such estimates. It is the authors' contention, however,
that in most instances of pre and post testing, the expected differences will
not be zero, and that any estimate of k that is greater than zero, but not
in excess of a practical difference, should be used. When in doubt, it seems
better to overestimate rather than underestimate the expected value.

One estimate of a population difference useful in Design 2 studies can
be obtained by examining the test-retest means from test-retest reliability
data. This mean difference is usually ignored because the reliability
coefficients are typically high. It is easy to forget that the Pearson
correlation used to obtain test-retest reliability is insensitive to changes
in the mean. Stability over time in the test-retest sense refers primarily
to the preservation of relative rank order and does not reflect a change in
the mean over time. It is not unusual to have a test-retest reliability
coefficient of .90 and a difference between the test-retest means that is
statistically significant. The authors have found that the typical mean
posttest increase shown in technical reports for IQ tests and achievement
batteries is often significant beyond the .05 level. These measures are
usually taken over a 2- to 6-week period with no intentional treatment
intervention.
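
The point that a high reliability coefficient can coexist with a significant mean shift can be illustrated with a small sketch. The scores below are hypothetical and were invented for this illustration; the helper names `pearson_r` and `paired_t` are likewise assumptions, and the critical value is the tabled two-tailed .05 value of t for 9 degrees of freedom.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(first, second):
    """t for correlated samples, tested against a zero expected difference."""
    diffs = [b - a for a, b in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical test and retest scores for ten examinees (not from the paper).
test = [90, 95, 100, 105, 110, 115, 120, 98, 102, 108]
retest = [93, 97, 104, 107, 114, 117, 124, 100, 106, 110]

T_CRIT_DF9 = 2.262  # tabled two-tailed .05 critical value, 9 df

r = pearson_r(test, retest)
mean_diff = statistics.mean(retest) - statistics.mean(test)
t = paired_t(test, retest)

print(f"reliability r = {r:.3f}")
print(f"mean retest gain = {mean_diff:.2f}")
print(f"t = {t:.2f}, significant at .05: {abs(t) > T_CRIT_DF9}")
```

Because every examinee gains a few points, rank order is almost perfectly preserved and the correlation stays high, yet the small, consistent gain produces a large t.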

As noted earlier, the reluctance to use such estimates of the expected
difference reflects concerns related to sampling stability and the comparability
of populations. However, these concerns are important regardless of the chosen
value of k. It is obviously inappropriate to administer a test designed and
normed on one population to a sample from another population and expect to
have comparable results. Furthermore, population estimates based upon small
heterogeneous samples are not as stable as one would desire. Keeping these
factors in mind, however, the authors still recommend using a statistically
significant test-retest mean difference as the expected value of k, rather
than zero, because the chances are that even this value will underestimate
the expected value more often than is desirable.

It seems reasonable to assume that such factors as history, maturation,
the effect of pretest administration and statistical regression are reflected
to some extent in these test-retest mean differences (Campbell & Stanley,
1963). The relatively short test-retest intervals used in reliability
analysis will tend to increase the effect of some of these factors while
decreasing the effects of the others. By using the test-retest mean
difference as suggested here, the effects of these confounding variables
should be reduced, thereby increasing the validity of rejecting the
null hypothesis.






The form of the t test for correlated samples is:

$$ t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} $$

where $\mu_D$ is the expected value and $\bar{D}$ represents the sample mean difference.
The standard error, $s_{\bar{D}}$, is based upon the sample data obtained for the study
regardless of the value chosen for $\mu_D$. The recommendation for Design 2 studies,
then, is to use the test-retest mean difference, if available and significant,
as the estimate of $\mu_D$. When test-retest data are not available, it may
still be better to select some arbitrary small value of $\mu_D$ rather than to
use zero.
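
As a minimal sketch of this recommendation, the following computes t for correlated samples against both a zero and a non-zero expected difference. The pretest and posttest scores, the retest gain of 3.5 points, and the function name `correlated_t` are all hypothetical and chosen only for illustration; the critical value is the tabled two-tailed .05 value of t for 9 degrees of freedom.

```python
import math
import statistics


def correlated_t(pre, post, mu_d=0.0):
    """Return (t, df) where t = (mean difference - mu_d) / standard error.

    The standard error comes only from the sample differences, so it does
    not change with the value chosen for mu_d.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return (statistics.mean(diffs) - mu_d) / se, n - 1


# Hypothetical Design 2 data for ten students (not from the paper).
pre = [48, 52, 55, 60, 45, 58, 50, 62, 47, 53]
post = [52, 55, 60, 63, 50, 61, 55, 65, 52, 56]

RETEST_GAIN = 3.5   # assumed test-retest mean difference from a test manual
T_CRIT_DF9 = 2.262  # tabled two-tailed .05 critical value, 9 df

t_zero, df = correlated_t(pre, post)               # conventional null, mu_d = 0
t_shift, _ = correlated_t(pre, post, RETEST_GAIN)  # null set to the retest gain

print(f"df = {df}")
print(f"t against 0:   {t_zero:.2f}, significant: {abs(t_zero) > T_CRIT_DF9}")
print(f"t against 3.5: {t_shift:.2f}, significant: {abs(t_shift) > T_CRIT_DF9}")
```

With these invented scores the gain is significant against zero but not against the assumed retest gain, which is the kind of invalid significant difference the non-zero estimate is meant to screen out.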

When a control group is available and Design 2 is unnecessary, the usual
approaches to analyzing the data are 1) to compare the raw difference scores
for the experimental and control groups using a t test for independent samples
or 2) to compare the adjusted posttest means of the experimental and control
groups using analysis of covariance with the pretest scores serving as the
covariate. In both of these analyses, one is essentially interested in
determining whether there is significantly greater gain in the experimental group
than in the control group. However, neither of these procedures deals directly
with the actual magnitude of the gain. While it is true that in the case of
the raw difference score approach the mean difference for each group is
available, this difference for the experimental group is quite likely to be greater
than it should be, as noted earlier. Again, the overestimate is based upon a
null hypothesis of $\mu_1 = \mu_2$. Therefore, when control group data are available
and a significant difference between the experimental and control groups has
been found, it may prove advantageous to use a t test for correlated samples
and analyze the gain in the experimental group, using the control group mean
difference as $\mu_D$. If proper sampling and assignment procedures are
employed, this analysis should reduce the artificial gain attributed to
non-treatment factors. If the accuracy of the estimated control group mean
difference is seriously questioned, then one could establish 95 percent
confidence limits for the estimate and select the lower bound.
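
The control group procedure can be sketched as follows. All scores are hypothetical, the function names are assumptions introduced for this illustration, and the critical value of t is supplied from a standard table (two-tailed .05, 9 degrees of freedom) rather than computed.

```python
import math
import statistics


def mean_diff_and_se(pre, post):
    """Mean pretest-posttest difference and its standard error."""
    diffs = [b - a for a, b in zip(pre, post)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs), se


def lower_confidence_limit(pre, post, t_crit):
    """Lower bound of the confidence interval for the mean difference."""
    d_bar, se = mean_diff_and_se(pre, post)
    return d_bar - t_crit * se


def gain_t(pre, post, mu_d):
    """t for correlated samples against an expected difference mu_d."""
    d_bar, se = mean_diff_and_se(pre, post)
    return (d_bar - mu_d) / se


T_CRIT_DF9 = 2.262  # tabled two-tailed .05 critical value, 9 df

# Hypothetical control and experimental groups, ten students each.
ctrl_pre = [50, 54, 47, 60, 52, 58, 45, 55, 49, 57]
ctrl_post = [52, 55, 50, 61, 55, 59, 48, 56, 52, 58]
exp_pre = [48, 52, 55, 60, 45, 58, 50, 62, 47, 53]
exp_post = [54, 56, 63, 65, 52, 61, 59, 67, 53, 60]

ctrl_d_bar, _ = mean_diff_and_se(ctrl_pre, ctrl_post)
exp_d_bar, _ = mean_diff_and_se(exp_pre, exp_post)

# Control group mean difference used as the expected value.
t_gain = gain_t(exp_pre, exp_post, ctrl_d_bar)
adjusted_gain = exp_d_bar - ctrl_d_bar

# More cautious choice when the control estimate is in doubt.
lower = lower_confidence_limit(ctrl_pre, ctrl_post, T_CRIT_DF9)
t_gain_lower = gain_t(exp_pre, exp_post, lower)

print(f"control gain = {ctrl_d_bar:.2f}, lower 95% limit = {lower:.2f}")
print(f"adjusted experimental gain = {adjusted_gain:.2f}")
print(f"t against control gain = {t_gain:.2f}")
print(f"t against lower limit  = {t_gain_lower:.2f}")
```

Subtracting the control group gain from the experimental group gain gives the adjusted estimate of change, while the t statistic indicates whether the experimental gain exceeds what non-treatment factors alone would be expected to produce.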



References



Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental designs
for research. Chicago: Rand McNally, 1963.

Kerlinger, F. N. Foundations of behavioral research (2nd ed.). New York: Holt,
Rinehart and Winston, Inc., 1973.

Winer, B. J. Statistical principles in experimental design (2nd ed.). New
York: McGraw-Hill Book Company, 1971.




